Sequence determination from overlapping fragments: a simple model of whole-genome shotgun sequencing.

Authors

  • Bernard Derrida
  • Thomas M A Fink
Abstract

Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.

Similar resources

A New Algorithm for DNA Sequence Assembly

Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given ...

Mega Weaver: A Simple Iterative Approach for BAC Consensus Assembly

Hierarchical genome assembly can be divided into three distinct stages: sequencing and assembling shotgun reads for each of a series of selected BAC clones; assembling the resulting fragments into BAC consensus sequences; and mapping and orienting the BAC consensus according to external positional information. We report a new approach for BAC consensus assembly that relies on iterative layouts ...

Efficiently detecting polymorphisms during the fragment assembly process

MOTIVATION Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments covering the same region of the genome whose sequences differ due to polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do no...

A Preprocessor for Shotgun Assembly of Large Genomes

The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments that have been selected at random from a genome. The sequence of bases at each end of the fragment is determined, albeit imprecisely, resulting in a sequence of letters called a "read....

Strategies and Clinical Applications of Next Generation Sequencing

Abstract DNA sequencing is one of the great valuable techniques in molecular biology, which can be used to detect the sequence of nucleotides in a DNA fragment. The high-throughput sequencing known as Next Generation Sequencing (NGS) revolutionized genomic research and molecular biology; therefore, the whole human genome can be sequenced with a low cost in several days. NGS technology is simi...

Journal title:
  • Physical Review Letters

Volume 88, Issue 6

Pages: -

Publication date: 2002